~4Dgifts/toolbox/src/exampleCode/audio/sonic README

The contents of this README is an article that originally appeared in the
March/April 1993 issue of the Silicon Graphics support magazine "Pipeline",
Volume 4, Number 2. It is the companion to the sonic.c program example.


ADDING AUDIO TO AN EXISTING GRAPHICS APPLICATION

This article discusses some of the common issues and techniques programmers
encounter when adding audio to an existing graphics application. Here, we'll
look at an example program, sonic.c, that illustrates some of the techniques
you will want to use: spawning a separate audio process, initializing the
audio port, reading audio files, playing one-shot vs. continuous audio, mixing
sounds, using stereo audio effectively in 3D programs, and dealing with the
CPU demands of an audio process that cause undesirable clicking.


HARDWARE AND SOFTWARE REQUIREMENTS

Currently, under version 4.0.5F of IRIX, audio development can be done on any
Iris Indigo platform - R3000 or R4000, from Entry to XS to Elan graphics. This
class of machine has standard digital audio hardware and is able to run the
Digital Media Development Option, for which this example program was written.

The Digital Media Development Option is a relatively new software package
consisting of Application Programmers' Interfaces (APIs) for digital audio
(libaudio), starter video (libsvideo), MIDI (libmidi), AIFF-C audio files
(libaudiofile), and CD and DAT drive audio access (libcdaudio and
libdataudio). The package comes with the Digital Media Programmers Guide to
aid in learning about the new libraries. The Digital Media Development Option
requires version 4.0.5F of the operating system and will run on any Iris
Indigo system.

Note: Some 4D35's have the ability to produce the same quality sound as that
of the Indigo. Programming audio on the 4D35's at version 4.0.5A or earlier of
the operating system uses the same Audio Library that the Indigo uses. The
Digital Media Development Option, unfortunately, will currently not run on a
4D35 system, as IRIX version 4.0.5F is not qualified for installation on
4D35's.

For this programming example, you'll need audiodev.sw from the Digital Media
Development Option, which contains the libraries for the Audio Library (AL)
and the Audio File Library (AF). These subsystems comprise the Audio
Development Environment, version 2.0, and are provided with the Digital Media
Development Option. (These are the successors to the Audio Development
Environment originally provided with the Iris Development Option in version
4.0.1 of the operating system.) This example also makes use of some of the
Prosonus sound files found in the /usr/lib/sounds/prosonus directory. These
audio files come from the audio.data.sounds subsystem of the 4.0.5F Update to
the operating system. Check the output of the IRIX command versions to make
sure these subsystems are installed on your machine.


ABOUT THE EXISTING GRAPHICS PROGRAM

The structure of the graphics program that sonic is based on is typical of
Graphics Library (GL) programs and lends itself easily to conversion to
mixed-model programming. Windowing and input routines are handled in the main
body of the program. GL routines that need to be called once are handled in
the sceneinit() function, rendering is handled by the drawscene() function,
and animation is handled by the movesphere() function.

The program itself models a stationary observer inside a colored cube with a
sphere that flies around the inside of the cube, bouncing off walls. The
observer can rotate (but not translate) the camera's viewpoint by moving the
mouse to see different views from inside the cube. The left mouse button
starts and stops the motion of the sphere.
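For orientation, the overall shape of such a GL program is sketched below.
This is only a schematic outline, not sonic.c itself - the windowing and event
handling shown are typical GL boilerplate, and only sceneinit(), drawscene(),
and movesphere() come from the description above.

    /* Schematic outline of the graphics side (not sonic.c itself). */
    #include <gl/gl.h>
    #include <gl/device.h>

    void sceneinit(void);    /* one-time GL setup                      */
    void drawscene(void);    /* render the cube, sphere, and viewpoint */
    void movesphere(void);   /* advance the sphere's animation         */

    int main(void)
    {
        short val;
        int   done = 0;

        winopen("sonic");
        doublebuffer();
        RGBmode();
        gconfig();
        qdevice(ESCKEY);
        qdevice(LEFTMOUSE);

        sceneinit();
        while (!done) {
            while (qtest()) {                 /* drain pending input events */
                switch (qread(&val)) {
                case ESCKEY:    done = 1;                   break;
                case LEFTMOUSE: /* toggle the sphere here */ break;
                }
            }
            movesphere();
            drawscene();
            swapbuffers();
        }
        return 0;
    }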
We'll be adding two different sounds to create a stereo audio environment for
the application. The first sound is an example of continuous audio - a sound
the sphere makes continuously as it moves. This is analogous to the constant
engine noise of a plane. The sound's intensity increases as the sphere
approaches the observer and decreases as it moves away. As the observer's
viewpoint rotates, the audio "location" (as perceived through headphones) will
also change; that is, as the sphere passes the observer from right to left, so
will the sound of the sphere.

The second sound is a one-shot sound - a sound the sphere makes whenever its
direction changes, either from bouncing off a wall or from the user toggling
the sphere's motion with the left mouse button. This is analogous to the sound
of a missile hitting a target. This sound is also affected by the orientation
of the observer and the distance from the sphere's event.


AUDIO LIBRARY BASICS

The Audio Library (AL) divides the task of sending audio samples to the
hardware into two main areas of control - devices and ports. The audio device
controls the input/output volume, input source, input/output sampling rate,
etc. AL functions exist to control the default audio device; however, it is
considered "polite" audio etiquette to let the user control these via apanel.
Apanel itself is just an AL program that controls the default audio device
with a graphical user interface. It is possible for a program that asserts its
own audio device parameters to modify another audio program's device settings.

The Indigo's default audio device supplies up to four audio ports, either
stereo or mono, for AL applications to use. An audio port is the entity
through which an AL program reads samples from, or writes samples to, the
audio device. A port can be thought of as a queue of sound samples, where the
AL programmer has control over only one end of the queue. Thus, a program that
receives audio input reads samples from one end of this queue, with the audio
device supplying the samples from an input source such as a microphone.
Conversely, a program that generates audio output supplies data for one end of
the queue, and the audio device sends the queued samples to the audio outputs,
such as the Indigo's speaker or a set of headphones. An audio program that did
both audio input and audio output would require the use of two audio ports and
their associated queues. We'll discuss the size of this queue a little later.

A port can be configured to use the audio sample datatype that best suits the
application with the ALsetsampfmt() function. Sample data can be represented
by a 2's complement integer or single precision floating point. Integer data
can be 8, 16, or 24 bits wide. Sample width is controlled by the ALsetwidth()
command.

Sample width does not represent the maximum amplitude of the input or output
of an audio signal coming into or out of the audio jacks. If this were true,
one could incorrectly infer that an audio port with a sample width of 24 could
have a louder dynamic range than an audio port of width 8. Instead, sample
width represents the degree of precision to which the full scale range of an
input or output signal will be sampled. That is, if the maximum value for an
8-bit sample is 127, or 2^7 - 1, the signal level represented by this sample
could also be represented by a 16-bit sample whose value is 32767, or
2^15 - 1, or a 24-bit sample whose value is 2^23 - 1.
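A quick arithmetic illustration of that point: the same 100% full-scale level
expressed at three widths. This little program is purely illustrative and is
not part of sonic.

    /* Sample width is precision, not loudness: the same full-scale level
     * expressed at three different widths.  Illustrative only. */
    #include <stdio.h>

    int main(void)
    {
        double level = 1.0;                              /* 100% of full scale */
        long   s8  = (long)(level * ((1L <<  7) - 1));   /*      127           */
        long   s16 = (long)(level * ((1L << 15) - 1));   /*    32767           */
        long   s24 = (long)(level * ((1L << 23) - 1));   /*  8388607           */

        printf("8-bit: %ld  16-bit: %ld  24-bit: %ld\n", s8, s16, s24);
        return 0;
    }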
For floating point data, sample width is always the same, but having MAXFLOAT
as the maximum amplitude is often impractical. The AL function ALsetfloatmax()
allows the programmer to specify an appropriate maximum value for their own
data when the sample format is designated to be floats. The dynamic range of
the data is required to be symmetrical and centered around the value 0, so the
absolute value of the minimum amplitude is always equal to the maximum
amplitude.

Ports can be configured to accept either stereo or monaural sample streams
with the ALsetchannels() call. Stereo sample streams are implemented as
interleaved left-right pairs, where even-numbered samples represent the left
channel and odd-numbered samples represent the right channel. As one might
expect, a stereo sample buffer will be twice as big as a mono sample buffer.

         Array index          0   1   2   3   4   5
                            -------------------------
    Audio Sample Array      | L | R | L | R | L | R | ...
                            -------------------------
                              \   /
                               \ /
                      a stereo sample pair to be
                     input or output simultaneously


CREATING A SEPARATE PROCESS FOR AUDIO

It's nice to be able to keep audio and graphics separate. For this example
program, the audio we're producing is in reaction to events being handled by
the section of the program responsible for graphics. The graphics process
controls the motion and position of the sphere, as well as the orientation of
the observer. These are all aspects we'd like our audio display to reflect,
but not control.

Creating a completely separate process to handle the audio has one main
benefit - it provides enough independence of the audio aspects of the
application so that audio performance is not degraded when the controlling
process contends for graphics resources. This independence can be achieved
with the sproc() function and can be enhanced by raising the priority of the
audio process through IRIX scheduling control, which will be discussed later.

The sproc() function is the mechanism for spawning a separate audio child
process from our parent process, which handles the graphics. Sproc() is nice
and easy for our purposes. It says, "Create a separate process and have it
start out by executing the function I tell you." The original process will
continue merrily on its way. Besides a starting-point function, sproc() takes
another argument which tells how the parent and child processes should share
data.

The sonic program starts the newly created child process at the audioloop()
function. The PR_SALL argument that sonic uses tells the parent and child to
share nearly everything. We're mostly interested that the parent and child
processes share virtual address space and that the data in this address space
is consistent between them. This means that the audio process will get to look
at how the graphics process changes the values of the "interesting" variables.
It also means that if either the graphics process or the audio process changes
the value of a variable, the other will know about it immediately. Having the
variables shared becomes the mechanism of communication between the two
processes. See the man page for sproc() for intimate details.

In general, it is not recommended that two separate processes both call GL
functions pertaining to the same connection to the graphics pipe. To avoid
encouraging graphics calls within the audio process, and to establish a
logical separation of the graphics and audio processes, the sproc() is called
before winopen().
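A minimal sketch of that startup order is shown below. It is not sonic's exact
code - audioloop() and the shared globals are assumed to be defined elsewhere,
and error handling is reduced to a single check.

    /* Sketch: spawn the audio process before any graphics are initialized.
     * PR_SALL shares (nearly) everything, including virtual address space,
     * so globals updated by the graphics process are seen by the audio
     * process. */
    #include <sys/types.h>
    #include <sys/prctl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <gl/gl.h>

    void audioloop(void *arg);       /* starting point for the audio child */

    int main(int argc, char *argv[])
    {
        if (sproc(audioloop, PR_SALL) == -1) {
            perror("sproc");
            exit(1);
        }

        /* only the parent (graphics) process talks to the graphics pipe */
        winopen("sonic");
        /* ... GL setup, event loop, movesphere()/drawscene() ... */

        return 0;
    }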
INITIALIZATION

The main task of any AL program is to read or write audio data to or from an
audio port fast enough that the user perceives the desired audio effect
without interruption, i.e. the sample queue is never completely empty. Any AL
program that performs audio processing for output will have a code structure
that looks something like the pseudo-code below. The elements of the
pseudo-code can be seen in sonic.c.

    #include <audio.h>

    ALport   audioport;
    ALconfig audioconfig;

    /* Audio initialization */
    audioconfig = ALnewconfig();        /* new config structure      */
    ...                                 /* set up configuration      */
                                        /* of audio port             */
    ALsetsampfmt(audioconfig, AL_SAMPFMT_FLOAT);
    ...
    audioport = ALopenport("port name", "w", audioconfig);
                                        /* open audio port           */

    /* Audio main loop */
    while (!done) {
        process_audio();                /* compute samples           */
        ALwritesamps(audioport, samplebuffer, bufferlength);
                                        /* output samples to port    */
    }

    /* Audio shut down */
    ALfreeconfig(audioconfig);
    ALcloseport(audioport);             /* close audio port          */

Notice that port configuration information is put into the Audio Library
structure ALconfig, which is then passed as a parameter to the ALopenport()
function. This implies that if we wish to change to or mix different sample
formats of our data, or change any other aspect of the audio port's
configuration, we will either need to open another audio port or convert all
sample data to one common format.

Choosing the size of the sample queue for the configuration of the audio port
is very important in applications such as this, where audio dynamics are
constantly changing. The AL function ALsetqueuesize() provides the means of
control here. The Audio Library currently allows a minimum queue size of 1024
samples (or 512 stereo sample pairs) for the floating point audio port we're
using in sonic. It is not a bad idea to set the size of your sample queue to
about twice the number of samples you are processing. This gives some leeway
for audio processing to take a little longer than expected if the audio device
occasionally drains the audio sample queue too fast, but also provides enough
room to send a fresh batch of samples if the queue is draining too slowly.
However, it is possible for audio latency to increase with a queue larger than
needed. Stereo sample queues need to be kept at even lengths so that the sense
of stereo separation will not be switched for stereo sample pairs in every
second call to ALwritesamps().

Try changing the #define BUFFERSIZE to 512. Recompile and run the sonic
program. You might notice that, in general, the audio sounds scratchier. This
can be because the actual processing of the audio is taking longer than the
hardware's playing of the audio. In other words, it's possible for the sample
queue to be emptied faster than it's filled. A small sample buffer may provide
lower latency between updates to the output sample stream, but you need to
keep the output continuous to keep it from producing undesirable clicks. On
the other hand, a larger sample buffer will increase latency and is apt to
detract from the overall audio experience. Change the BUFFERSIZE to 44100 to
hear the effects of high latency. Notice that the stereo placement of the
sphere seems choppier.

A BUFFERSIZE of 4000 seems to do a good job for sonic - enough to keep the
audio process busy, without detracting from the user's interaction with the
application. You'll have to find your own happy medium for your own
application. (Don't forget to change BUFFERSIZE back!)
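Putting the configuration calls discussed so far together, here is a sketch of
how a sonic-style output port might be opened. The buffer size, port name, and
function name are illustrative rather than sonic's actual values.

    /* Sketch: open a stereo floating-point output port whose queue is about
     * twice the per-iteration buffer size.  BUFFERSIZE and the port name are
     * illustrative. */
    #include <audio.h>

    #define BUFFERSIZE 4000                 /* samples processed per iteration */

    ALport open_output_port(void)
    {
        ALconfig config;
        ALport   port;

        config = ALnewconfig();
        ALsetsampfmt(config, AL_SAMPFMT_FLOAT);
        ALsetfloatmax(config, 1.0);              /* full scale is +/- 1.0       */
        ALsetchannels(config, 2);                /* stereo: interleaved L/R     */
        ALsetqueuesize(config, BUFFERSIZE * 2);  /* leeway to run early or late */

        port = ALopenport("sonic output", "w", config);
        ALfreeconfig(config);                    /* the port keeps its settings */
        return port;
    }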
Thirty-two bit floats are used in this example as the base sample format to
which all sample data will be converted. They provide a convenient means for
multiplication without type casting in the time-critical sections of the code
that do the actual sound processing. Also, specifying a maximum amplitude
value of 1.0 can provide a handy normalization of all sound data, especially
if some waveforms are to affect the behavior of other waveforms (like an
envelope function).


READING AUDIO FILES

The sounds used in sonic are read in by the routine init_sound(). It uses the
Audio File Library (AF) to read AIFF (.aiff suffix) and AIFF-C (.aifc suffix)
audio files. To provide a common ground for some nice sounds, sonic uses
sounds from the /usr/lib/sounds/prosonus directory. You should be able to
change which sounds are used by editing the BALLFILENAME and WALLFILENAME
#defines.

The AF has two main structures to deal with. An audio file setup, which is
used mainly for writing audio files, is needed to get an audio file handle
with the AFopenfile() routine. For more information on audio file setups, see
the Digital Audio and MIDI Programming Guide that comes with the Digital Media
Development Option. An audio file contains a lot of information about the data
inside. Sample format, sample width, number of channels, and number of sample
frames, as well as the sound data itself, are about all this example needs. It
is possible to get information from an AIFF-C file that describes looping
points, pitch, and suggested playback rates (if they are provided in the
file). See the Audio Interchange File Format AIFF-C specification and the
Digital Audio and MIDI Programming Guide for more details on what can be
stored in an AIFF-C file.

When reading audio files into your program you may find it necessary to
convert the sample data into a format that's better suited to your
application. Most of the prosonus sounds are in 16-bit 2's complement format.
Any file that is not in this format produces an error message, as an
appropriate conversion to floating point for other formats was not implemented
for the sake of simplicity. Since the program is dealing with point sources of
audio, a stereo sound source is inappropriate. Thus, the conversion to
floating point also includes a conversion from stereo to mono. In this
conversion, only the left channel is used. A summation or average of the left
and right channels could have been just as easy to implement as our conversion
from stereo to mono.
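The sketch below shows one way such a read-and-convert step could look using
the AF calls mentioned above. It is not sonic's init_sound(): error checking
is omitted, the file is assumed to be 16-bit 2's complement as described, and
only the left channel is kept.

    /* Sketch (not sonic's init_sound()): read a 16-bit AIFF/AIFF-C file and
     * convert it to mono floats normalized to +/- 1.0, keeping only the left
     * channel. */
    #include <audiofile.h>
    #include <stdlib.h>

    float *read_sound(const char *path, long *nframes)
    {
        AFfilehandle file;
        long         channels, frames, i;
        short       *raw;
        float       *samples;

        file     = AFopenfile(path, "r", AF_NULL_FILESETUP);
        channels = AFgetchannels(file, AF_DEFAULT_TRACK);
        frames   = AFgetframecnt(file, AF_DEFAULT_TRACK);

        raw     = (short *) malloc(frames * channels * sizeof(short));
        samples = (float *) malloc(frames * sizeof(float));

        AFreadframes(file, AF_DEFAULT_TRACK, raw, frames);
        AFclosefile(file);

        for (i = 0; i < frames; i++)                   /* left channel only, */
            samples[i] = raw[i * channels] / 32768.0;  /* scaled to +/- 1.0  */

        free(raw);
        *nframes = frames;
        return samples;
    }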
ONE-SHOT & CONTINUOUS AUDIO

Many applications add audio simply by making a system() call to the IRIX
playaiff or playaifc utilities. For some applications this is enough to add a
little bit of audio, but this approach can be limiting in that your audio is
only as effective as the sound file you play. This solution can be a quick and
dirty way to do one-shot audio - audio that can be triggered by a single
event, like a car crashing into a tree, or the sound of a ball hitting a
tennis racket - but it comes with the penalty of losing interaction with the
sound. Sometimes interaction is not a concern for these types of sounds.

Continuous audio is different from one-shot audio in that it describes a sound
that's always present to a certain degree, like the sound of a jet engine for
a flight simulator, or the sound of crickets for ambience. In an application
where the audio output changes continually with the user's input, it can be
convenient to prepare the samples in chunks of equal amounts of time. Changes
in audio dynamics will happen on sound buffer boundaries, and multiple sounds
will need to be mixed together to form one sample stream.

Processing continuous audio is fairly straightforward. Sounds can be longer
than the buffer that is being used for output, therefore an index needs to be
kept of where the continuous sound left off for the next time around. Looping
can be achieved by a test to see if the index goes out of bounds or (even
better) by a modulo function.

Processing one-shot audio is similar to continuous audio, with the additional
criterion that the program needs to keep track of information such as when to
start the sound, when to continue processing the sound (if the one-shot sound
is longer than the audio buffer), and when to stop processing the sound. Sonic
defines the variable "hit" to describe whether there is NO_HIT on the wall (no
processing needed), the sphere JUST_HIT the wall (start processing), or the
wall has BEEN_HIT by the sphere (continue processing). While it is the
graphics process that initiates the one-shot sound by changing the state of
the "hit" variable, it is the audio process that acknowledges the completion
of the one-shot sound by changing the state of the variable to indicate
completion.
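The sketch below shows one way this bookkeeping could be arranged for a single
output buffer. It is a simplified illustration, not sonic's code: the array
names, lengths, and the numeric values of the hit states are assumed, and the
1/2 weighting in the mix anticipates the discussion of mixing later in this
article.

    /* Sketch: fill one mono buffer with the looping continuous sound, then
     * mix in the one-shot sound while the "hit" state machine says it is
     * active.  Names, lengths, and state values are illustrative. */
    #define NO_HIT   0
    #define JUST_HIT 1
    #define BEEN_HIT 2

    extern int hit;                  /* set to JUST_HIT by graphics process */

    void fill_buffer(float *buf, long buflen,
                     float *ballsnd, long balllen,   /* continuous sound */
                     float *wallsnd, long walllen)   /* one-shot sound   */
    {
        static long ballindex = 0;   /* where the continuous sound left off */
        static long wallindex = 0;   /* progress through the one-shot sound */
        long i;

        /* continuous sound: always present, looped with a modulo index */
        for (i = 0; i < buflen; i++)
            buf[i] = ballsnd[(ballindex + i) % balllen];
        ballindex = (ballindex + buflen) % balllen;

        /* one-shot sound: start, continue across buffers, then retire */
        if (hit == JUST_HIT) {
            wallindex = 0;
            hit = BEEN_HIT;
        }
        if (hit == BEEN_HIT) {
            for (i = 0; i < buflen && wallindex < walllen; i++, wallindex++)
                buf[i] = 0.5 * buf[i] + 0.5 * wallsnd[wallindex];  /* weighted mix */
            if (wallindex >= walllen)
                hit = NO_HIT;        /* audio process marks it finished */
        }
    }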
SIMPLE CALCULATIONS FOR SPATIALIZATION OF AUDIO

Sonic uses some _very_ basic calculations that attempt a 3-D audio simulation
of the sphere/room environment. Amplitude and left-right balance relative to
the observer are the only audio cues calculated in this example. You will
notice that if the sphere is in close proximity to the observer, the sounds
emanating from the sphere are louder than they would be if the sphere were in
the distance. You will also notice that as the orientation of the observation
point is changed, the left-right location of the sound changes accordingly.
After a bit of playing with sonic, you may notice that your sense of whether
the sound is coming from in front of you or behind you depends on the visual
cue of the sphere being within the field of view of the graphics window. It is
audibly obvious that sonic does not attempt any calculations for top-bottom or
front-back spatialization. With a two-channel (left-right) audio display
system, such as a pair of headphones, anything other than a sense of
left-right balance is computationally difficult to simulate, so we'll ignore
it here.

The first thing we need to calculate is the position of the sphere, described
by the coordinates <sphx,sphy,sphz>, relative to the orientation of the
observer, described by the angles rx and ry. In order to do this correctly,
we'll need to borrow from the computer graphics concept of viewing
transformations to compute which direction the observer should perceive the
sound to be coming from. Using these relative coordinates, we can first
compute the overall amplitude of the sound to reflect the distance of the
sound, and then compute the amplitude of the sound for each speaker to reflect
the location of the sound in the audio field.

It is the responsibility of the graphics process to update the coordinates of
the moving sphere, <sphx,sphy,sphz>, and the angles describing the orientation
of the observer, rx and ry. Since location in the audio display needs to
correspond to location in the graphics display, we need to follow the order in
which the modelling and viewing transformations are performed in the graphics
process. In the graphics process, the GL commands

    rot(ry, 'y');
    rot(rx, 'x');

correspond to the following matrix equation for each point that is passed
through the transformation matrix. (Remember that GL premultiplies its
matrices!)

  [relx rely relz 1] = [sphx sphy sphz 1] * Rotx(radx) * Roty(rady)

                          | 1     0          0        0 |   | cos(rady)  0  -sin(rady)  0 |
                          | 0   cos(radx)  sin(radx)  0 |   |    0       1      0       0 |
     = [sphx sphy sphz 1]*| 0  -sin(radx)  cos(radx)  0 | * | sin(rady)  0   cos(rady)  0 |
                          | 0     0          0        1 |   |    0       0      0       1 |

                          | cos(rady)             0          -sin(rady)            0 |
                          | sin(radx)*sin(rady)   cos(radx)   sin(radx)*cos(rady)  0 |
     = [sphx sphy sphz 1]*| cos(radx)*sin(rady)  -sin(radx)   cos(radx)*cos(rady)  0 |
                          |     0                 0              0                 1 |

or

  relx =  sphx*cos(rady) + sphy*sin(radx)*sin(rady) + sphz*cos(radx)*sin(rady)
  rely =  sphy*cos(radx) - sphz*sin(radx)
  relz = -sphx*sin(rady) + sphy*sin(radx)*cos(rady) + sphz*cos(radx)*cos(rady)

where sphx, sphy, and sphz are the world coordinates of the sphere; relx,
rely, and relz are the coordinates of the sphere relative to the observer; and
radx and rady are the rotations (in radians) about the x and y axes,
respectively.

The overall amplitude of a sound can give some impression of a sense of
distance. Each buffer of sound to be processed at any given slice of time is
multiplied by an amplitude scaling value that is based on the 3-D distance of
the sphere relative to the observer. That amplitude scaling value is
approximated with an inverse square of the distance from the center of the
observer's head to the center of the sphere by the equation

    amplitude = 1.0 / (distance*distance + 1.0)

The 1.0 added to the square of the distance in the denominator ensures that we
get a valid scaling value between 0.0 and 1.0, even when the sphere is right
on top of the observer at a distance of 0.0.
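A small sketch of these two calculations in C follows. It assumes, for
illustration, that rx and ry are kept in degrees (the units GL's rot() takes)
and converts them to radians; the function and variable names are not sonic's.

    /* Sketch: compute the sphere's position relative to the observer and the
     * distance-based amplitude scale.  Assumes rx/ry are in degrees; names
     * are illustrative. */
    #include <math.h>

    void distance_cues(float sphx, float sphy, float sphz, /* sphere, world coords */
                       float rx, float ry,                 /* observer rotation    */
                       float *relx, float *rely, float *relz,
                       float *amplitude)
    {
        float radx = rx * M_PI / 180.0;
        float rady = ry * M_PI / 180.0;
        float distance;

        /* same rotations the graphics process applies, so audio matches video */
        *relx =  sphx*cos(rady) + sphy*sin(radx)*sin(rady) + sphz*cos(radx)*sin(rady);
        *rely =  sphy*cos(radx) - sphz*sin(radx);
        *relz = -sphx*sin(rady) + sphy*sin(radx)*cos(rady) + sphz*cos(radx)*cos(rady);

        /* rotation preserves length, so either coordinate set gives the distance */
        distance   = sqrt(sphx*sphx + sphy*sphy + sphz*sphz);
        *amplitude = 1.0 / (distance*distance + 1.0);
    }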
Since the most common method of audio display is either a set of headphones or
a pair of speakers, sonic only attempts to simulate a sense of left and right.
It may be possible to simulate a sense of top and bottom, as well as a sense
of front and back, perhaps with a combination of filters and echoes; however,
this can be computationally expensive and quite complex. Thus, for the sake of
this simple example, sonic ignores these techniques for aurally simulating
orientation.

One way of considering how left-right orientation is perceived is to think of
a listener interpreting a given stereo sound as having some angle from what
the listener considers to be directly in front of them. The balance knob on an
everyday stereo controls this sense of left-right placement. We'll use the
term "balance" to describe the listener's perceived sense of left and right.
We can think of balance being on a scale from 0.0 to 1.0, where 0.0 is full
volume in the right ear only, 1.0 is full volume in the left ear only, and 0.5
is the middle point with both ears at half volume. Now we need some way of
relating this 0.0 - 1.0 scale for balance to the general orientation of the
sonic sphere with respect to the observer.

For convenience, we can think of our sound space for computing balance as
2-dimensional. Since we're not worrying about aurally simulating top-bottom
orientation, our 3-D space for graphics can be projected onto the 2-D plane
for our audio, the x-z plane, where the listener's perception of straight
ahead is in the positive z direction (0.5 on our balance scale), full right
extends in the positive x direction (a balance of 0.0), and full left extends
in the negative x direction (a balance of 1.0).

                      half left / half right
                      balance = 0.5, angle = PI/2
                                +z
                                 ^
                                 |      O  sphere
                                 |     /
                                 |    /
                                 |   /  perceived angle
     full left                   |  /)                  full right
     balance = 1.0               | /                    balance = 0.0
     angle = PI     -x <---------+---o----------> +x    angle = 0
                                 |   (1.0,0.0)
                                 |
                      observer at the origin, (0.0,0.0)

The angle to be interpreted by the listener and mapped onto our scale for
balance is the angle between the vector extending from the center of the
observer to the center of the sphere and the line that goes through both ears
of the observer (the line z=0). A simple way of mapping this angle onto our
0.0 to 1.0 scale for balance is the arccosine function. An angle of PI radians
maps onto our scale for balance at 1.0 - all the way left; an angle of 0
radians maps onto our scale for balance at 0.0 - all the way right. To map our
vector from observer to sphere onto the unit circle required by the arccosine
function, we need to normalize the vector, so the argument to the arccosine
function is the distance which the normalized vector from the observer to the
center of the sphere extends in the x direction. So the equation sonic uses to
compute left-right balance is

    balance = acos(relx/distance) / PI

Other spatialization techniques with which you may wish to experiment include
some sort of phase calculation between both ears. Adding reverb or delay that
interacts with the sounds can add a sense of depth to the room. Filters for
top-bottom or front-back transfer functions could also be implemented.
However, none of these would come without adding computational complexity and
an extra tax on the CPU.
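As a sketch of how these cues might be applied, the fragment below scales an
already-prepared mono buffer into the interleaved stereo layout described
earlier, using a simple linear gain split. The names are illustrative, not
sonic's.

    /* Sketch: apply the amplitude and balance cues to a processed mono
     * buffer, producing the interleaved stereo buffer sent to the audio
     * port.  Names are illustrative. */
    #include <math.h>

    void make_stereo(float *mono, float *stereo, long nframes,
                     float relx, float distance, float amplitude)
    {
        float balance   = (distance > 0.0)                /* 1.0 = full left  */
                          ? acos(relx / distance) / M_PI  /* 0.0 = full right */
                          : 0.5;                          /* on top: centered */
        float leftgain  = amplitude * balance;
        float rightgain = amplitude * (1.0 - balance);
        long  i;

        for (i = 0; i < nframes; i++) {
            stereo[2*i]     = leftgain  * mono[i];   /* even index: left channel */
            stereo[2*i + 1] = rightgain * mono[i];   /* odd index: right channel */
        }
    }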
SENDING MORE THAN ONE SOUND TO A SINGLE AUDIO PORT

Since the continuous audio is always the first sound to be processed in sonic,
no mixing is needed for it - samples are just copied into the buffer - but the
one-shot samples need to be mixed with those that are already there.
Processing the one-shot sound always comes after the processing of the
continuous sound in this application. A simple mixing operation can be written
by just summing the samples that are currently in the sample buffer with those
that are about to be mixed in. Beware! Clipping, perceived as sharp clicks,
can occur if the sum of the two samples exceeds the value for maximum or
minimum amplitude. To prevent this undesirable effect, weighted averages of
the samples to be summed can be used. If a maximum of two sounds will be
mixed, weighting each sound by 1/2 before summation will guarantee no
clipping. For 3 sounds use 1/3, etc. This guarantee does not come without a
trade-off, though. You'll have to decide on yet another happy medium, this
time between the clipping that a straight summation can produce and the
general decrease in overall volume that weighted averages can produce.

Now that we've done all our processing, it's time to send the sample buffer to
the audio port for output using the ALwritesamps() call. If there is not
enough room in the sample queue for the entire sample buffer to fit,
ALwritesamps() will block until there is enough space. It is possible to get
feedback on the progress of the draining (or filling, if the audio port is
configured for input rather than output) of the queue by the audio device. The
ALgetfillable() and ALgetfilled() functions can be used to give an idea of how
many samples are left to go before sufficient space is available in the queue.
The sonic audio process calls sginap() to give up the CPU if it needs to wait
for room in the sample queue.


COMMON SOURCES OF AUDIO CLICKS AND DISCONTINUITY

Discontinuities in audio can arise as sharp clicks or complete dropouts of
sound. In general, smooth audio performance deserves more attention than
graphics performance and frame rates. A sudden, unexpected loud click is much
more irritating to an end-user than graphics that aren't going as fast as they
could. Here are some common causes of, and suggested workarounds for,
discontinuities in audio:

1) Audio processing is feeding the output sample buffer slower than the audio
   device is draining the samples. As discussed earlier, this usually happens
   with small sample queue sizes. Increasing the queue size for your audio
   port _can_ help here. Keep in mind that extensive audio processing may bog
   down the CPU, in which case your audio process may never be able to keep
   the sample queue filled adequately.

2) Clipping from mixing sounds. The "Beware!" from the text above. See the
   section "Sending more than one sound to a single audio port".

3) Buffer boundaries in interactive audio. In the graphics process, motion is
   only as continuous as the frame rate will dictate. If your audio process is
   like sonic, audio dynamics can change from iteration to iteration of the
   sound processing. Like the frame rate in graphics, the continuity of the
   audio is only as smooth as the continuity of the data that changes the
   dynamics of the audio. This source of discontinuity tends to be more
   subtle. Perception of this type of discontinuity can be reduced by
   decreasing the size of the audio buffer.

4) Other processes are contending for the same CPU. Indigos are
   single-processor machines, and all other processes need to use the same CPU
   as your audio process. The audio process can lose CPU time due to IRIX
   scheduling of other processes (including the graphics process). One
   solution is to use the schedctl() function. Upon entering the audioloop()
   function, sonic's audio process tries to schedule itself at a high,
   non-degrading priority to put its priority above other user processes
   contending for the CPU. To gain the high, non-degrading priority the
   process must be run as the super-user. Sonic continues on if the use of
   schedctl() fails. See the man page for schedctl() for the gory details.
   (A sketch combining schedctl() and plock() appears after this list.)

5) The audio process is swapping out of virtual memory. Just because a process
   has a high priority doesn't mean parts of virtual memory will not be
   swapped out to disk. You can use the plock() command to combat this. Sonic
   attempts to lock the process into memory just after attempting the
   schedctl() command. The argument of PROCLOCK indicates that all aspects of
   the program are to be locked into memory if possible. Like schedctl(),
   plock() will only be successful if the effective user id is that of the
   super-user. Sonic continues on if the use of plock() fails. See the man
   page for plock() for the details. Not everyone that runs the program will
   have access to the super-user account.
   You can ensure that users can execute the program as super-user, to take
   advantage of a high, non-degrading priority and of locking the process into
   memory, by changing the ownership of the executable to root and by changing
   the permissions to set-user-id on execution. See the man page for chmod for
   more details.
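Below is a sketch of the priority and memory-locking steps described in items
4 and 5 above. The priority value NDPHIMAX is an illustrative choice rather
than necessarily sonic's; check the schedctl() and plock() man pages for the
exact calling conventions on your system.

    /* Sketch: raise the audio process to a high, non-degrading priority and
     * lock it into memory.  Both calls are allowed to fail (they require
     * super-user privilege); the program simply carries on. */
    #include <sys/schedctl.h>
    #include <sys/lock.h>
    #include <stdio.h>

    void audioloop(void *arg)
    {
        if (schedctl(NDPRI, 0, NDPHIMAX) == -1)    /* needs super-user */
            fprintf(stderr, "schedctl failed; running at normal priority\n");

        if (plock(PROCLOCK) == -1)                 /* needs super-user */
            fprintf(stderr, "plock failed; process may be swapped out\n");

        /* ... open the audio port and enter the sample-processing loop ... */
    }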